Hepatology
- Asia > Japan > Honshū > Chūbu > Nagano Prefecture > Nagano (0.04)
- North America > United States > North Carolina (0.04)
- Europe > Switzerland > Basel-City > Basel (0.04)
- Asia > India (0.04)
ContextualSHAP: Enhancing SHAP Explanations Through Contextual Language Generation
Dwiyanti, Latifa, Wibisono, Sergio Ryan, Nambo, Hidetaka
Explainable Artificial Intelligence (XAI) has become an increasingly important area of research, particularly as machine learning models are deployed in high-stakes domains. Among various XAI approaches, SHAP (SHapley Additive exPlanations) has gained prominence due to its ability to provide both global and local explanations across different machine learning models. While SHAP effectively visualizes feature importance, it often lacks contextual explanations that are meaningful for end-users, especially those without technical backgrounds. To address this gap, we propose a Python package that extends SHAP by integrating it with a large language model (LLM), specifically OpenAI's GPT, to generate contextualized textual explanations. This integration is guided by user-defined parameters (such as feature aliases, descriptions, and additional background) to tailor the explanation to both the model context and the user perspective. We hypothesize that this enhancement can improve the perceived understandability of SHAP explanations. To evaluate the effectiveness of the proposed package, we applied it in a healthcare-related case study and conducted user evaluations involving real end-users. The results, based on Likert-scale surveys and follow-up interviews, indicate that the generated explanations were perceived as more understandable and contextually appropriate compared to visual-only outputs. While the findings are preliminary, they suggest that combining visualization with contextualized text may support more user-friendly and trustworthy model explanations.
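As a rough illustration of the idea in the abstract above, the sketch below assembles an LLM prompt from per-feature SHAP values plus user-supplied aliases, descriptions, and background. It is not the package's API: `build_prompt`, its parameters, and the example feature names and values are all hypothetical, and the call to the LLM itself is omitted.

```python
def build_prompt(shap_values, aliases=None, descriptions=None, background="", top_k=3):
    """Turn SHAP contributions into a plain-language prompt for an LLM.

    Hypothetical helper for illustration; not part of SHAP or the described package.
    """
    aliases = aliases or {}
    descriptions = descriptions or {}
    # Rank features by absolute contribution, largest first, and keep the top_k.
    ranked = sorted(shap_values.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top_k]
    lines = []
    for name, value in ranked:
        label = aliases.get(name, name)
        direction = "raises" if value > 0 else "lowers"
        detail = descriptions.get(name)
        suffix = f" ({detail})" if detail else ""
        lines.append(f"- {label}{suffix}: {direction} the prediction by {abs(value):.2f}")
    header = f"Background: {background}\n" if background else ""
    return (
        header
        + "Explain this model prediction to a non-technical reader.\n"
        + "Feature contributions (SHAP values):\n"
        + "\n".join(lines)
    )


# Made-up SHAP values and metadata, for illustration only.
example_shap = {"alb": -0.42, "bili": 0.31, "age": 0.05, "sex": -0.01}
prompt = build_prompt(
    example_shap,
    aliases={"alb": "Albumin", "bili": "Bilirubin"},
    descriptions={"alb": "blood protein made by the liver"},
    background="The model estimates liver disease risk.",
    top_k=2,
)
print(prompt)
```

The resulting string would be sent to the LLM alongside the usual SHAP plot. Keeping the prompt builder separate from the model call makes the user-defined context easy to inspect before any text is generated.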
- North America > United States (0.14)
- Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.05)
- Asia > Indonesia > Java > West Java > Bandung (0.05)
- Research Report (1.00)
- Overview (1.00)
- Health & Medicine > Therapeutic Area > Nephrology (0.50)
- Health & Medicine > Therapeutic Area > Hepatology (0.49)
- North America > United States > Pennsylvania (0.04)
- North America > United States > Minnesota (0.04)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
- Health & Medicine > Therapeutic Area > Immunology (1.00)
- Health & Medicine > Therapeutic Area > Hepatology (1.00)
- Government > Regional Government > North America Government > United States Government (0.70)
Upstream Probabilistic Meta-Imputation for Multimodal Pediatric Pancreatitis Classification
Nelson, Max A., Keles, Elif, Tasci, Eminenur Sen, Yazol, Merve, Aktas, Halil Ertugrul, Hong, Ziliang, Bejar, Andrea Mia, Durak, Gorkem, Boyunaga, Oznur Leman, Bagci, Ulas
Pediatric pancreatitis is a progressive and debilitating inflammatory condition, including acute pancreatitis and chronic pancreatitis, that presents significant clinical diagnostic challenges. Machine learning-based methods also face diagnostic challenges due to limited sample availability and multimodal imaging complexity. To address these challenges, this paper introduces Upstream Probabilistic Meta-Imputation (UPMI), a lightweight augmentation strategy that operates upstream of a meta-learner in a low-dimensional meta-feature space rather than in image space. Modality-specific logistic regressions (T1W and T2W MRI radiomics) produce probability outputs that are transformed into a 7-dimensional meta-feature vector. Class-conditional Gaussian mixture models (GMMs) are then fit within each cross-validation fold to sample synthetic meta-features that, combined with real meta-features, train a Random Forest (RF) meta-classifier. On 67 pediatric subjects with paired T1W/T2W MRIs, UPMI achieves a mean AUC of 0.908 $\pm$ 0.072, a $\sim$5% relative gain over a real-only baseline (AUC 0.864 $\pm$ 0.061).
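The augmentation step described above can be sketched in a few lines. This is a simplified stand-in, not the authors' implementation: it fits a single diagonal Gaussian per class (the one-component special case of a GMM) using only the standard library, takes meta-feature vectors as given because the abstract does not say how the 7 dimensions are built, and leaves out the base logistic regressions and the Random Forest. All function names and toy values are hypothetical.

```python
import random
import statistics


def fit_class_gaussians(features, labels):
    """Fit one diagonal Gaussian per class: per-dimension mean and std."""
    params = {}
    for cls in sorted(set(labels)):
        rows = [f for f, y in zip(features, labels) if y == cls]
        columns = list(zip(*rows))
        means = [statistics.fmean(col) for col in columns]
        stds = [statistics.pstdev(col) for col in columns]
        params[cls] = (means, stds)
    return params


def sample_synthetic(params, n_per_class, rng):
    """Draw synthetic meta-feature vectors from each class's Gaussian."""
    synth_x, synth_y = [], []
    for cls, (means, stds) in params.items():
        for _ in range(n_per_class):
            synth_x.append([rng.gauss(m, s) for m, s in zip(means, stds)])
            synth_y.append(cls)
    return synth_x, synth_y


rng = random.Random(0)
# Toy training-fold meta-features (made up; 2-D for brevity, the paper uses 7-D).
train_x = [[0.2, 0.3], [0.1, 0.4], [0.3, 0.2], [0.8, 0.7], [0.9, 0.6], [0.7, 0.9]]
train_y = [0, 0, 0, 1, 1, 1]

# Fit on the training fold only, so held-out data never shapes the generator.
params = fit_class_gaussians(train_x, train_y)
synth_x, synth_y = sample_synthetic(params, n_per_class=5, rng=rng)

# Synthetic rows are appended to the real rows before training the meta-classifier.
aug_x = train_x + synth_x
aug_y = train_y + synth_y
print(len(aug_x), "training rows after augmentation")
```

Fitting the generator inside each fold, as the abstract specifies, is what keeps the synthetic samples from leaking information about the held-out subjects into training.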
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Europe > Finland > Uusimaa > Helsinki (0.04)
- Asia > Middle East > Republic of Türkiye > Ankara Province > Ankara (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.90)
- Health & Medicine > Therapeutic Area > Hepatology (1.00)
- Health & Medicine > Therapeutic Area > Gastroenterology (1.00)
- Health & Medicine > Therapeutic Area > Endocrinology (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.50)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.49)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)
- Health & Medicine > Therapeutic Area > Immunology (0.46)
- Health & Medicine > Therapeutic Area > Hepatology (0.46)
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.05)
- Health & Medicine > Therapeutic Area > Immunology (1.00)
- Health & Medicine > Therapeutic Area > Hepatology (1.00)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.94)
- Health & Medicine > Health Care Providers & Services (0.93)
Generative Medical Event Models Improve with Scale
Waxler, Shane, Blazek, Paul, White, Davis, Sneider, Daniel, Chung, Kevin, Nagarathnam, Mani, Williams, Patrick, Voeller, Hank, Wong, Karen, Swanhorst, Matthew, Zhang, Sheng, Usuyama, Naoto, Wong, Cliff, Naumann, Tristan, Poon, Hoifung, Loza, Andrew, Meeker, Daniella, Hain, Seth, Shah, Rahul
Realizing personalized medicine at scale calls for methods that distill insights from longitudinal patient journeys, which can be viewed as a sequence of medical events. Foundation models pretrained on large-scale medical event data represent a promising direction for scaling real-world evidence generation and generalizing to diverse downstream tasks. Using Epic Cosmos, a dataset with medical events from de-identified longitudinal health records for 16.3 billion encounters over 300 million unique patient records from 310 health systems, we introduce the Curiosity models, a family of decoder-only transformer models pretrained on 118 million patients representing 115 billion discrete medical events (151 billion tokens). We present the largest scaling-law study of medical event data, establishing a methodology for pretraining and revealing power-law scaling relationships for compute, tokens, and model size. Consequently, we pretrained a series of compute-optimal models with up to 1 billion parameters. Conditioned on a patient's real-world history, Curiosity autoregressively predicts the next medical event to simulate patient health timelines. We studied 78 real-world tasks, including diagnosis prediction, disease prognosis, and healthcare operations. Remarkably for a foundation model with generic pretraining and simulation-based inference, Curiosity generally outperformed or matched task-specific supervised models on these tasks, without requiring task-specific fine-tuning or few-shot examples. Curiosity's predictive power consistently improves as the model and pretraining scale. Our results show that Curiosity, a generative medical event foundation model, can effectively capture complex clinical dynamics, providing an extensible and generalizable framework to support clinical decision-making, streamline healthcare operations, and improve patient outcomes.
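For readers unfamiliar with how a power-law scaling relationship is estimated, the sketch below fits y = a * x^b by ordinary least squares in log-log space. The functional form of the paper's fits is not given in the abstract, so a generic single-term power law is assumed here, and the data points are synthetic rather than results from the study.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**b via a straight line in log-log space."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxy = sum((lx - mean_x) * (ly - mean_y) for lx, ly in zip(log_x, log_y))
    sxx = sum((lx - mean_x) ** 2 for lx in log_x)
    b = sxy / sxx                       # slope in log-log space is the exponent
    a = math.exp(mean_y - b * mean_x)   # intercept gives the prefactor
    return a, b


# Synthetic points generated from loss = 5 * compute**-0.1 (not from the paper).
compute = [1e15, 1e16, 1e17, 1e18]
loss = [5.0 * c ** -0.1 for c in compute]
a, b = fit_power_law(compute, loss)
print(f"a={a:.3f}, b={b:.3f}")
```

A negative exponent means the loss falls as the scaled quantity grows, which is the pattern a scaling-law study looks for across compute, tokens, and model size.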
- North America > United States > Alaska (0.04)
- North America > Canada (0.04)
- Asia > Middle East > Saudi Arabia (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Therapeutic Area > Rheumatology (1.00)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
Rethinking Retrieval-Augmented Generation for Medicine: A Large-Scale, Systematic Expert Evaluation and Practical Insights
Kim, Hyunjae, Sohn, Jiwoong, Gilson, Aidan, Cochran-Caggiano, Nicholas, Applebaum, Serina, Jin, Heeju, Park, Seihee, Park, Yujin, Park, Jiyeong, Choi, Seoyoung, Contreras, Brittany Alexandra Herrera, Huang, Thomas, Yun, Jaehoon, Wei, Ethan F., Jiang, Roy, Colucci, Leah, Lai, Eric, Dave, Amisha, Guo, Tuo, Singer, Maxwell B., Koo, Yonghoe, Adelman, Ron A., Zou, James, Taylor, Andrew, Cohan, Arman, Xu, Hua, Chen, Qingyu
Large language models (LLMs) are transforming the landscape of medicine, yet two fundamental challenges persist: keeping up with rapidly evolving medical knowledge and providing verifiable, evidence-grounded reasoning. Retrieval-augmented generation (RAG) has been widely adopted to address these limitations by supplementing model outputs with retrieved evidence. However, whether RAG reliably achieves these goals remains unclear. Here, we present the most comprehensive expert evaluation of RAG in medicine to date. Eighteen medical experts contributed a total of 80,502 annotations, assessing 800 model outputs generated by GPT-4o and Llama-3.1-8B across 200 real-world patient and USMLE-style queries. We systematically decomposed the RAG pipeline into three components: (i) evidence retrieval (relevance of retrieved passages), (ii) evidence selection (accuracy of evidence usage), and (iii) response generation (factuality and completeness of outputs). Contrary to expectation, standard RAG often degraded performance: only 22% of top-16 passages were relevant, evidence selection remained weak (precision 41-43%, recall 27-49%), and factuality and completeness dropped by up to 6% and 5%, respectively, compared with non-RAG variants. Retrieval and evidence selection remain key failure points for the model, contributing to the overall performance drop. We further show that simple yet effective strategies, including evidence filtering and query reformulation, substantially mitigate these issues, improving performance on MedMCQA and MedXpertQA by up to 12% and 8.2%, respectively. These findings call for re-examining RAG's role in medicine and highlight the importance of stage-aware evaluation and deliberate system design for reliable medical LLM applications.
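The stage-wise metrics quoted above (share of relevant passages among the top 16, and precision and recall of evidence selection) can be illustrated with a small sketch. The abstract does not state exactly how the study computed them, so plain set-based definitions are assumed, and the passage IDs are hypothetical.

```python
def precision_recall(selected, relevant):
    """Precision and recall of selected passage IDs against relevant passage IDs."""
    selected, relevant = set(selected), set(relevant)
    hits = len(selected & relevant)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical passage IDs for a single query.
retrieved = list(range(1, 17))      # top-16 retrieved passages
relevant = {2, 5, 9, 14}            # passages an annotator judged relevant
used_by_model = {2, 3, 5, 7, 11}    # passages the model actually drew on

# Stage (i): how much of what was retrieved is relevant at all.
retrieval_relevance = len(relevant & set(retrieved)) / len(retrieved)
# Stage (ii): how well the model picks relevant evidence from what it was given.
p, r = precision_recall(used_by_model, relevant)
print(f"relevant among top-16: {retrieval_relevance:.2f}")
print(f"selection precision: {p:.2f}, recall: {r:.2f}")
```

Scoring the stages separately shows whether a weak answer stems from poor retrieval or from poor use of good passages, which a single end-to-end accuracy figure would hide.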
- Europe > Austria > Vienna (0.14)
- Europe > Switzerland > Zürich > Zürich (0.14)
- Asia > South Korea > Seoul > Seoul (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
CoTox: Chain-of-Thought-Based Molecular Toxicity Reasoning and Prediction
Park, Jueon, Park, Yein, Song, Minju, Park, Soyon, Lee, Donghyeon, Baek, Seungheun, Kang, Jaewoo
Drug toxicity remains a major challenge in pharmaceutical development. Recent machine learning models have improved in silico toxicity prediction, but their reliance on annotated data and lack of interpretability limit their applicability. This limits their ability to capture organ-specific toxicities driven by complex biological mechanisms. Large language models (LLMs) offer a promising alternative through step-by-step reasoning and integration of textual data, yet prior approaches lack biological context and transparent rationale. To address this issue, we propose CoTox, a novel framework that integrates an LLM with chain-of-thought (CoT) reasoning for multi-toxicity prediction. CoTox combines chemical structure data, biological pathways, and gene ontology (GO) terms to generate interpretable toxicity predictions through step-by-step reasoning. Using GPT-4o, we show that CoTox outperforms both traditional machine learning and deep learning models. We further examine its performance across various LLMs to identify where CoTox is most effective. Additionally, we find that representing chemical structures with IUPAC names, which are easier for LLMs to understand than SMILES, enhances the model's reasoning ability and improves predictive performance. To demonstrate its practical utility in drug development, we simulate the treatment of relevant cell types with a drug and incorporate the resulting biological context into the CoTox framework. This approach allows CoTox to generate toxicity predictions aligned with physiological responses, as shown in a case study. This result highlights the potential of LLM-based frameworks to improve interpretability and support early-stage drug safety assessment. The code and prompt used in this work are available at https://github.com/dmis-lab/CoTox.
- North America > United States (0.14)
- Asia > South Korea > Seoul > Seoul (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.68)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Health & Medicine > Therapeutic Area > Hepatology (0.46)
- Health & Medicine > Therapeutic Area > Nephrology (0.46)
- Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.46)